A variety of logical frameworks support the use of higher-order abstract syntax in representing formal
systems; however, each system has its own set of benchmarks. Even worse, general proof assistants 
that provide special libraries for dealing with binders offer a very limited evaluation of such libraries, 
and the examples given often do not exercise and stress-test key aspects that arise in the presence of 
binders. In this paper we design an open repository ORBI (Open challenge problem Repository for 
systems supporting reasoning with Binders). We believe the field of reasoning about languages with 
binders has matured, and a common set of benchmarks provides an important basis for evaluation 
and qualitative comparison of different systems and libraries that support binders, and it will help to 
advance the field. 


1 Introduction 

A variety of logical frameworks support the use of higher-order abstract syntax (HOAS) in representing 
formal systems; however, each system has its own set of benchmarks, often encoding the same object 
logics with minor differences. Even worse, general proof assistants that provide special libraries for 
dealing with binders often offer only a very limited evaluation of such libraries, and the examples given 
often do not exercise and stress-test key aspects that arise in the presence of binders. 

The PoplMark challenge was an important milestone in surveying the state of the art in
mechanizing the meta-theory of programming languages. We ourselves proposed several specific
benchmarks that are crafted to highlight the differences between the designs of various meta-languages
with respect to reasoning with and within a context of assumptions, and we compared their
implementation in four systems: the logical framework Twelf, the dependently-typed functional language
Beluga, the two-level Hybrid system [6, 15] as implemented on top of Coq and Isabelle/HOL,
and the Abella system [10]. Finally, several systems that support reasoning with binders, in particular
systems concentrating on modeling binders using HOAS, also provide a large collection of examples and
case studies. For example, Twelf's wiki (http://twelf.org/wiki/Case_studies), Abella's library
(http://abella-prover.org/examples), Beluga's distribution, and the Coq implementation of
Hybrid (http://www.site.uottawa.ca/~afelty/HybridCoq/) contain sets of examples that highlight
the many issues surrounding binders.

As the field matures, we believe it is important to be able to systematically and qualitatively evaluate 
approaches that support reasoning with binders. Having benchmarks is a first step in this direction. 
In this paper, we propose a common infrastructure for representing challenge problems and a central
Open challenge problem Repository for systems supporting reasoning with BInders (ORBI) for sharing
benchmark problems based on the notation we have developed.

ORBI is designed to be a human-readable, easily machine-parsable, uniform, yet flexible and
extensible language for writing specifications of formal systems including grammar, inference rules,
contexts and theorems. The language directly supports HOAS representations and is oriented to support
the mechanization of benchmark problems in Twelf, Beluga, Abella, and Hybrid, hopefully without
precluding other existing or future HOAS systems. At the same time, we hope it is also amenable to
translations to systems using other representation techniques such as nominal ones.

We structure the language in two parts: 

1. the problem description, which includes the grammar of the object language syntax, inference 
rules, context schemas, and context relations 

2. the logic language, which includes syntax for expressing theorems and directives to ORBI2X¹
tools.

We begin in Sect. 2 with a running example. We consider the untyped lambda-calculus as an object
logic (OL), and present the syntax, some judgments, and sample theorems. In Sect. 3 we present ORBI
by giving its grammar and explaining how it is used to encode our running example; Sect. 3.1 and
Sect. 3.2 present the two parts of this specification as discussed above. We discuss related work in
Sect. 4.

We consider the notation that we present here as a first attempt at defining ORBI (Version 0.1), where
the goal is to cover the benchmarks we proposed earlier. As new benchmarks are added, we are well aware
that we will need to improve the syntax and increase the expressive power; we discuss limitations and
some possible extensions in Sect. 5.

2 A Running Example 

The first question that we face when defining an OL is how to describe well-formed objects. Consider 
the untyped lambda-calculus, defined by the following grammar: 

$$M ::= x \mid \mathsf{lam}\ x.\,M \mid \mathsf{app}\ M_1\ M_2$$

To capture additional information that is often useful in proofs, such as when a given term is closed,
it is customary to give inference rules in natural deduction style for well-formed terms using
hypothetical and parametric judgments. However, it is often convenient to present hypothetical judgments
in a localized form, reducing some of the ambiguity of the two-dimensional natural deduction notation,
and providing more structure. We therefore introduce an explicit context for bookkeeping, since when
establishing properties about a given system, it allows us to consider the variable case(s) separately
and to state clearly when considering closed objects, i.e., an object in the empty context. More
importantly, while structural properties of contexts are implicitly present in the natural deduction
presentation of inference rules (where assumptions are managed informally), the explicit context
presentation makes them more apparent and highlights their use in reasoning about contexts.

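$$
\frac{\mathsf{is\_tm}\ x \in \Gamma}{\Gamma \vdash \mathsf{is\_tm}\ x}
\qquad
\frac{\Gamma, \mathsf{is\_tm}\ x \vdash \mathsf{is\_tm}\ M}{\Gamma \vdash \mathsf{is\_tm}\ (\mathsf{lam}\ x.\,M)}
\qquad
\frac{\Gamma \vdash \mathsf{is\_tm}\ M_1 \quad \Gamma \vdash \mathsf{is\_tm}\ M_2}{\Gamma \vdash \mathsf{is\_tm}\ (\mathsf{app}\ M_1\ M_2)}
$$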
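The three rules for well-formed terms can be read as a recursive check over the term, with the context threaded explicitly. The following is a minimal Python sketch under stated assumptions: the tuple encoding of terms with named variables and the function name `is_tm` are choices made here for illustration and are not part of ORBI. The sketch does not rename bound variables, so it assumes bound names are chosen distinct from those already in the context.

```python
# Terms in an illustrative first-order, named representation (an assumption
# of this sketch): ("var", x) | ("lam", x, body) | ("app", m1, m2)

def is_tm(ctx, term):
    """Return True if `ctx |- is_tm term` is derivable.

    `ctx` is a list of variable names; each entry x stands for the
    assumption `is_tm x`.
    """
    tag = term[0]
    if tag == "var":
        # Variable rule: the assumption must be present in the context.
        return term[1] in ctx
    if tag == "lam":
        # Binder rule: check the body in the context extended with x.
        _, x, body = term
        return is_tm(ctx + [x], body)
    if tag == "app":
        # Application rule: both subterms are checked in the same context.
        _, m1, m2 = term
        return is_tm(ctx, m1) and is_tm(ctx, m2)
    raise ValueError(f"unknown term constructor: {tag!r}")


identity = ("lam", "x", ("var", "x"))
open_term = ("app", ("var", "x"), ("var", "y"))

closed = is_tm([], identity)            # a closed term: empty context suffices
needs_ctx = is_tm(["x"], open_term)     # y is not assumed in the context
in_ctx = is_tm(["x", "y"], open_term)   # both variables are assumed
```

A term is closed exactly when the check succeeds in the empty context, which is the bookkeeping role the explicit context plays in the text.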


¹ Following TPTP's nomenclature [25], we call "ORBI2X" any tool taking an ORBI specification as input; for example,
the translator for Hybrid turns syntax, inference rules, and context definitions of ORBI into input to the Coq
version of Hybrid, and it is designed so that it can be adapted fairly directly to output Abella scripts.

Traditionally, a context of assumptions is characterized as a sequence of formulas $A_1, A_2, \ldots, A_n$
listing its elements separated by commas. In earlier work, we argue that this is not expressive enough
to capture the structure present in contexts, especially when mechanizing object logics, and we define
context schemas to introduce the required extra structure:

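$$
\begin{array}{llcl}
\text{Atom} & A & & \\
\text{Block of declarations} & D & ::= & A \mid D; A \\
\text{Context} & \Gamma & ::= & \cdot \mid \Gamma, D \\
\text{Schema} & S & ::= & D_s \mid D_s + S
\end{array}
$$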

A context is a sequence of declarations $D$ where a declaration is a block of individual atomic assumptions
separated by ';'. The ';' binds tighter than ','. We treat contexts as ordered, i.e., later assumptions in the
context may depend on earlier ones, but not vice versa; this is in contrast to viewing contexts as multi-sets.
Just as types classify terms, a schema will classify meaningful structured sequences. A schema consists
of declarations $D_s$, where we use the subscript $s$ to indicate that the declaration occurring in a concrete
context having schema $S$ may be an instance of $D_s$. We use $+$ to denote the alternatives in a context
schema. For well-formed terms, contexts have a simple structure where each block contains a single
atom, expressed as the following schema declaration:

$$S_x := \mathsf{is\_tm}\ x$$

We write $\Phi_x$ to represent a context satisfying schema $S_x$ (and similarly for other context schemas
appearing in this paper). Informally, this means that $\Phi_x$ has the form
$\mathsf{is\_tm}\ x_1, \ldots, \mathsf{is\_tm}\ x_n$ where $n \geq 0$ and $x_1, \ldots, x_n$ are distinct
variables. (A more formal account is given elsewhere.)
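To make the informal reading of a schema concrete, the following Python sketch checks whether a context is an instance of a schema. Everything about the encoding is an assumption of the sketch, not ORBI syntax: a context is a list of blocks, a block is a tuple of atoms, an atom is a tuple `(predicate, arg, ...)`, and a schema is a list of alternative block shapes (the `+` of the grammar), each shape listing predicate names with their arities.

```python
# Schema S_x: a single alternative whose block holds one atom `is_tm x`.
S_X = [(("is_tm", 1),)]


def block_variable(block, shape):
    """Return the variable introduced by `block` if the block is an instance
    of `shape`, and None otherwise."""
    if len(block) != len(shape):
        return None
    variables = set()
    for atom, (pred, arity) in zip(block, shape):
        if atom[0] != pred or len(atom) - 1 != arity:
            return None
        variables.update(atom[1:])
    # In the schemas of this section, every atom of a block mentions only
    # the single variable that the block introduces.
    return variables.pop() if len(variables) == 1 else None


def has_schema(ctx, schema):
    """Check that every block matches some alternative of the schema and
    that the variables introduced by the blocks are pairwise distinct."""
    seen = set()
    for block in ctx:
        candidates = (block_variable(block, shape) for shape in schema)
        var = next((v for v in candidates if v is not None), None)
        if var is None or var in seen:
            return False
        seen.add(var)
    return True


phi_x = [(("is_tm", "x1"),), (("is_tm", "x2"),)]
ok = has_schema(phi_x, S_X)
empty_ok = has_schema([], S_X)                      # the case n = 0
duplicate = has_schema(phi_x + [(("is_tm", "x1"),)], S_X)
wrong_atom = has_schema([(("aeq", "x", "x"),)], S_X)
```

The empty context is accepted, and a repeated variable is rejected, matching the requirement that the variables be distinct.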

For our running example, we consider two more simple judgments. The first is algorithmic equality
for the untyped lambda-calculus, written (aeq M N). We say that two terms are algorithmically equal
provided they have the same structure with respect to the constructors.

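$$
\frac{\mathsf{aeq}\ x\ x \in \Gamma}{\Gamma \vdash \mathsf{aeq}\ x\ x}\ ae_v
\qquad
\frac{\Gamma, \mathsf{is\_tm}\ x; \mathsf{aeq}\ x\ x \vdash \mathsf{aeq}\ M\ N}{\Gamma \vdash \mathsf{aeq}\ (\mathsf{lam}\ x.\,M)\ (\mathsf{lam}\ x.\,N)}\ ae_l
\qquad
\frac{\Gamma \vdash \mathsf{aeq}\ M_1\ N_1 \quad \Gamma \vdash \mathsf{aeq}\ M_2\ N_2}{\Gamma \vdash \mathsf{aeq}\ (\mathsf{app}\ M_1\ M_2)\ (\mathsf{app}\ N_1\ N_2)}\ ae_a
$$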

The second is declarative equality written (deq M N), which includes versions of the above three rules
called $de_v$, $de_l$ and $de_a$, where aeq is replaced by deq everywhere, plus reflexivity, symmetry and
transitivity shown below.³


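$$
\frac{}{\Gamma \vdash \mathsf{deq}\ M\ M}\ de_r
\qquad
\frac{\Gamma \vdash \mathsf{deq}\ N\ M}{\Gamma \vdash \mathsf{deq}\ M\ N}\ de_s
\qquad
\frac{\Gamma \vdash \mathsf{deq}\ M\ L \quad \Gamma \vdash \mathsf{deq}\ L\ N}{\Gamma \vdash \mathsf{deq}\ M\ N}\ de_t
$$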

These judgments give rise to the following schema declarations: 

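$$
\begin{array}{lcl}
S_{xa} & := & \mathsf{is\_tm}\ x; \mathsf{aeq}\ x\ x \\
S_{xd} & := & \mathsf{is\_tm}\ x; \mathsf{deq}\ x\ x \\
S_{da} & := & \mathsf{is\_tm}\ x; \mathsf{deq}\ x\ x; \mathsf{aeq}\ x\ x
\end{array}
$$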

The first two come directly from the $ae_l$ and $de_l$ rules where declaration blocks come in pairs. The third
combines the two, and is used below in stating one of the example theorems.

² This is an oversimplification, since there are well-known specifications where contexts have more structure,
such as solutions to the PoplMark challenge. In fact, those are already legal ORBI specs.

³ We acknowledge that this definition of declarative equality has a degree of redundancy: the assumption
deq x x in rule $de_l$ is not needed, since rule $de_r$ plays the variable role. However, this formulation
exhibits issues, such as context subsumption, that would otherwise require more complex benchmarks.

When stating properties, we often need to relate two judgments to each other, where each one has
its own context. For example, we may want to prove statements such as "if $\Phi_x \vdash J_1$ then
$\Phi_{xa} \vdash J_2$". The proofs in our earlier work use two approaches.⁴ In the first, the statement is
reinterpreted in the smallest context that collects all relevant assumptions; we call this the generalized
context approach (G). The above statement becomes "if $\Phi_{xa} \vdash J_1$ then $\Phi_{xa} \vdash J_2$".
As an example theorem, we consider the completeness of declarative equality with respect to algorithmic
equality, of which we only show the interesting left-to-right direction.

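Theorem 2.1 (Completeness, G Version)

Admissibility of Reflexivity: If $\Phi_{xa} \vdash \mathsf{is\_tm}\ M$ then $\Phi_{xa} \vdash \mathsf{aeq}\ M\ M$.

Admissibility of Symmetry: If $\Phi_{xa} \vdash \mathsf{aeq}\ M\ N$ then $\Phi_{xa} \vdash \mathsf{aeq}\ N\ M$.

Admissibility of Transitivity: If $\Phi_{xa} \vdash \mathsf{aeq}\ M\ N$ and $\Phi_{xa} \vdash \mathsf{aeq}\ N\ L$ then $\Phi_{xa} \vdash \mathsf{aeq}\ M\ L$.

Main Theorem: If $\Phi_{da} \vdash \mathsf{deq}\ M\ N$ then $\Phi_{da} \vdash \mathsf{aeq}\ M\ N$.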

In the second approach, we state how two (or more) contexts are related via context relations. For
example, the following relation captures the fact that is_tm x will occur in $\Phi_x$ in sync with an
assumption block containing is_tm x; aeq x x in $\Phi_{xa}$.

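$$
\frac{}{\cdot \sim \cdot}
\qquad
\frac{\Phi_x \sim \Phi_{xa}}{\Phi_x, \mathsf{is\_tm}\ x \ \sim\ \Phi_{xa}, \mathsf{is\_tm}\ x; \mathsf{aeq}\ x\ x}
$$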


Similarly, we can define $\Phi_{xa} \sim \Phi_{xd}$.

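$$
\frac{}{\cdot \sim \cdot}
\qquad
\frac{\Phi_{xa} \sim \Phi_{xd}}{\Phi_{xa}, \mathsf{is\_tm}\ x; \mathsf{aeq}\ x\ x \ \sim\ \Phi_{xd}, \mathsf{is\_tm}\ x; \mathsf{deq}\ x\ x}
$$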
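The two rules defining a context relation translate directly into a recursive check that walks both contexts in lockstep. The Python sketch below does this for the relation between a context of schema $S_x$ and one of schema $S_{xa}$. The encoding (a context as a list of blocks, a block as a tuple of atoms) and the function name `related_x_xa` are assumptions made for illustration.

```python
def related_x_xa(phi_x, phi_xa):
    """Check whether phi_x ~ phi_xa holds.

    Base rule: two empty contexts are related.
    Step rule: related contexts stay related when extended with the blocks
    `is_tm x` and `is_tm x; aeq x x` for the same variable x.
    """
    if not phi_x and not phi_xa:
        return True
    if not phi_x or not phi_xa:
        # Contexts of different lengths cannot be related.
        return False
    block_x, block_xa = phi_x[-1], phi_xa[-1]
    if len(block_x) != 1 or len(block_x[0]) != 2 or block_x[0][0] != "is_tm":
        return False
    x = block_x[0][1]
    expected = (("is_tm", x), ("aeq", x, x))
    return block_xa == expected and related_x_xa(phi_x[:-1], phi_xa[:-1])


phi_x = [(("is_tm", "x"),), (("is_tm", "y"),)]
phi_xa = [
    (("is_tm", "x"), ("aeq", "x", "x")),
    (("is_tm", "y"), ("aeq", "y", "y")),
]

in_sync = related_x_xa(phi_x, phi_xa)
too_short = related_x_xa(phi_x, phi_xa[:1])
mismatch = related_x_xa([(("is_tm", "z"),)], phi_xa[:1])
```

The check fails as soon as the two contexts fall out of sync, either in length or in the variable a pair of blocks introduces.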

We call this the context relations approach (R). The theorems are then typically stated as: "if
$\Phi_x \vdash J_1$ and $\Phi_x \sim \Phi_{xa}$ then $\Phi_{xa} \vdash J_2$". We can then revisit the
completeness theorem for algorithmic equality together with the necessary lemmas as follows.

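Theorem 2.2 (Completeness, R Version)

Admissibility of Reflexivity: Assume $\Phi_x \sim \Phi_{xa}$. If $\Phi_x \vdash \mathsf{is\_tm}\ M$ then $\Phi_{xa} \vdash \mathsf{aeq}\ M\ M$.

Admissibility of Symmetry: If $\Phi_{xa} \vdash \mathsf{aeq}\ M\ N$ then $\Phi_{xa} \vdash \mathsf{aeq}\ N\ M$.

Admissibility of Transitivity: If $\Phi_{xa} \vdash \mathsf{aeq}\ M\ N$ and $\Phi_{xa} \vdash \mathsf{aeq}\ N\ L$ then $\Phi_{xa} \vdash \mathsf{aeq}\ M\ L$.

Main Theorem: Assume $\Phi_{xa} \sim \Phi_{xd}$. If $\Phi_{xd} \vdash \mathsf{deq}\ M\ N$ then $\Phi_{xa} \vdash \mathsf{aeq}\ M\ N$.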

3 ORBI 

ORBI aims to provide a common framework for systems that support reasoning with binders.
Currently, our design is geared towards systems supporting HOAS, where there are two main
approaches. On one side of the spectrum we have systems that implement various dependently-typed
calculi. Such systems include Twelf, Beluga, and Delphin [20]. All these systems also provide, to
various degrees, built-in support for reasoning modulo structural properties of a context of assumptions.
These systems support inductive reasoning over terms as well as rules. Often it is more elegant in these
systems to state theorems using the G-version.

⁴ In proofs on paper, the differences between the two approaches usually do not appear; they are present in the
details that are left implicit, but must be made explicit when mechanizing proofs. For example, on-paper versions
of the admissibility of reflexivity that make these distinctions explicit appear in our earlier work (proofs of
Theorems 7 and 8).

On the other side there are systems based on a proof-theoretic foundation, which typically follow a
two-level approach: they implement a specification logic (SL) inside a higher-order logic or type
theory. Hypothetical judgments of object languages are modeled using implication in the SL and parametric
judgments are handled via (generic) universal quantification. Contexts are commonly represented
explicitly as lists or sets in the SL, and structural properties are established separately as lemmas. For
example, substituting for an assumption is justified by appealing to the cut-admissibility lemma of the SL.
These lemmas are not directly and intrinsically supported through the SL, but may be integrated into a
system's automated proving procedures, usually via tactics. Induction is usually only supported on
derivations, but not on terms. Systems following this philosophy include Hybrid and Abella. Often these
systems are better suited to proving R-versions of theorems.

The desire for ORBI to cater to both type-theoretic and proof-theoretic frameworks requires an almost
impossible balancing act between the two views. For example, contexts are first-class and part of the
specification language in Beluga; in Twelf, schemas for contexts are part of the specification language,
which is an extension of LF, but users cannot explicitly quantify over contexts and manipulate them as
first-class objects; in Abella and Hybrid, contexts are (pre)defined using inductive definitions on the
reasoning level. We describe next our common infrastructure design, directives, and guidelines that allow
us to cater to existing systems supporting HOAS.

3.1 Problem Description in ORBI 

ORBI's language for defining the grammar of an object language together with inference rules is based
on the logical framework LF; pragmatically, we have adopted the concrete syntax of LF specifications
in Beluga, which is almost identical to Twelf's. The advantage is that specifications can be directly
parsed and, more importantly, type-checked by Beluga, thereby eliminating many syntactically correct but
meaningless expressions.

Object languages are written according to the EBNF grammar in Fig. 1, which uses certain standard
conventions: {a} means repeat a production zero or more times, and comments in the grammar are
enclosed between (* and *). The token id refers to identifiers starting with a lower or upper case letter.
These grammar rules are basically the standard ones used both in Twelf and Beluga and we do not discuss
them in detail here. We only note that while the presented grammar permits general dependent types up
to level n, ORBI specifications will only use level 0 and level 1. Intuitively, specifications at level 0
define the syntax of a given object language, while specifications at level 1 (i.e., type families that are
indexed by terms of level 0) describe the judgments and rules for a given OL. We exemplify the grammar
relative to the example of algorithmic vs. declarative equality. For more example specifications, we refer
the reader to our survey paper [8] or to https://github.com/pientka/ORBI.⁵

Syntax An ORBI file starts in the Syntax section with the declaration of the constants used to encode
the syntax of the OL in question, here untyped lambda-terms, which are introduced with the declarations:

⁵ The observant reader will have noticed that ORBI's concrete syntax for schemas differs from the one that we
have presented in Sect. 2, in that blocks are separated by commas and not by semi-colons. This is forced on us by
our choice to re-use Beluga's parsing and checking tools.

sig      ::= {decl                           (* declaration *)
           | s_decl}                         (* schema declaration *)
decl     ::= id ":" tp "."                   (* constant declaration *)
           | id ":" kind "."                 (* type declaration *)
op_arrow ::= "->" | "<-"                     (* A <- B same as B -> A *)
kind     ::= type
           | tp op_arrow kind                (* A -> K *)
           | "{" id ":" tp "}" kind          (* Pi x:A.K *)
tp       ::= id {term}                       (* a M1 ... Mn *)
           | tp op_arrow tp
           | "{" id ":" tp "}" tp            (* Pi x:A.B *)
term     ::= id                              (* constants, variables *)
           | "\" id "." term                 (* lambda x. M *)
           | term term                       (* M N *)
s_decl   ::= schema s_id "=" alt_blk ";"
s_id     ::= id
alt_blk  ::= blk {"+" blk}
blk      ::= block id ":" tp {"," id ":" tp}

Figure 1: ORBI grammar for syntax, judgments, inference rules, and context schemas

%% Syntax
tm: type.
app: tm -> tm -> tm.
lam: (tm -> tm) -> tm.

The declaration introducing type tm along with those of the constructors app and lam fully specify the
syntax of OL terms. We represent binders in the OL using binders in the HOAS meta-language. Hence
the constructor lam takes in a function of type tm -> tm. For example, the OL term (lam x. lam y. app x y)
is represented as lam (\x. lam (\y. app x y)), where "\" is the binder of the meta-language. Bound
variables found in the object language are not explicitly represented in the meta-language.
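The idea of letting the meta-language's binder stand for the object-language binder can be imitated in any language with first-class functions. The Python sketch below is only an analogy to the LF encoding, not ORBI itself: the classes `App`, `Lam`, `Var` and the printer `show` are hypothetical names introduced here. `Lam` stores a Python function, so the term has no bound-variable names of its own; `Var` is used only by the printer when it needs to go under a binder.

```python
import itertools


class App:
    def __init__(self, fun, arg):
        self.fun, self.arg = fun, arg


class Lam:
    def __init__(self, body):
        # `body` is a Python function from terms to terms: the binder of
        # the object language is a binder of the host language.
        self.body = body


class Var:
    """A named placeholder, created only while traversing under a binder."""
    def __init__(self, name):
        self.name = name


def show(term, fresh=None):
    """Print a term in the concrete syntax used in the text."""
    if fresh is None:
        fresh = (f"x{i}" for i in itertools.count(1))
    if isinstance(term, Var):
        return term.name
    if isinstance(term, App):
        return f"app {show_arg(term.fun, fresh)} {show_arg(term.arg, fresh)}"
    if isinstance(term, Lam):
        name = next(fresh)
        # Going under the binder means applying the host-level function.
        return f"lam (\\{name}. {show(term.body(Var(name)), fresh)})"
    raise TypeError(f"not a term: {term!r}")


def show_arg(term, fresh):
    text = show(term, fresh)
    return text if isinstance(term, Var) else f"({text})"


# The OL term (lam x. lam y. app x y) from the text.
example = Lam(lambda x: Lam(lambda y: App(x, y)))
printed = show(example)

# Substitution is function application in the host language.
instantiated = show(example.body(Lam(lambda z: z)))
```

Since substitution is inherited from the host language, no capture-avoiding substitution function has to be written for the object language; this is the main practical benefit the HOAS encoding aims for.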

Judgments and Rules These are introduced as LF type families (predicates) in the Judgments
section followed by object-level inference rules for these judgments in the Rules section. In our running
example, we have two judgments:

%% Judgments
aeq: tm -> tm -> type.
deq: tm -> tm -> type.

Consider first the inference rule for algorithmic equality for application, where the ORBI text is a
straightforward encoding of the rule:

ae_a: aeq M1 N1 -> aeq M2 N2
       -> aeq (app M1 M2) (app N1 N2).

$$
\frac{\Gamma \vdash \mathsf{aeq}\ M_1\ N_1 \quad \Gamma \vdash \mathsf{aeq}\ M_2\ N_2}{\Gamma \vdash \mathsf{aeq}\ (\mathsf{app}\ M_1\ M_2)\ (\mathsf{app}\ N_1\ N_2)}
$$

Uppercase letters such as M1 denote schematic variables, which are implicitly quantified at the outermost
level, namely {M1:tm}, as is commonly done for readability purposes in Twelf and Beluga.

The binder case is more interesting: 

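ae_l: ({x:tm} aeq x x -> aeq (M x) (N x))
       -> aeq (lam (\x. M x)) (lam (\x. N x)).

$$
\frac{\Gamma, \mathsf{is\_tm}\ x; \mathsf{aeq}\ x\ x \vdash \mathsf{aeq}\ M\ N}{\Gamma \vdash \mathsf{aeq}\ (\mathsf{lam}\ x.\,M)\ (\mathsf{lam}\ x.\,N)}
$$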

We view the is_tm x assumption as the parametric assumption x:tm, while the hypothesis aeq x x (and
its scoping) is encoded within the embedded implication aeq x x -> aeq (M x) (N x) in the
current (informal) signature augmented with the dynamic declaration for x. As is well known, parametric
assumptions and embedded implication are unified in the type-theoretic view. Note that the "variable"
case, namely rule $ae_v$, is folded inside the binder case. We list here the rest of the Rules section:

%% Rules
de_a: deq M1 N1 -> deq M2 N2 -> deq (app M1 M2) (app N1 N2).
de_l: ({x:tm} deq x x -> deq (M x) (N x))
       -> deq (lam (\x. M x)) (lam (\x. N x)).
de_r: deq M M.
de_s: deq N M -> deq M N.
de_t: deq M L -> deq L N -> deq M N.

Schemas A schema declaration s_decl is introduced using the keyword schema. A blk consists of
one or more declarations and alt_blk describes alternating schemas. For example, schemas mentioned
in Sect. 2 appear in the Schemas section as:

%% Schemas
schema xG = block (x:tm);
schema xaG = block (x:tm, u:aeq x x);
schema xdG = block (x:tm, u:deq x x);
schema daG = block (x:tm, u:deq x x, v:aeq x x);

To illustrate alternatives in contexts, consider extending our OL to the polymorphically typed
lambda-calculus, which includes a new type tp in the Syntax section, and a new judgment:

atp: tp -> tp -> type.

representing equality of types in the Judgments section (as well as type constructors and rules for
well-formed types and type equality, omitted here). With this extension, the following two examples replace
the first two schemas in the Schemas section.

schema xG = block (x:tm) + block (a:tp);
schema xaG = block (x:tm, u:aeq x x) + block (a:tp, v:atp a a);

While we type-check the schema definitions using an extension of the LF type checker (as
implemented in Beluga), we do not verify that the given schema definition is meaningful with respect to the
specification of the syntax and inference rules; in other words, we do not perform "world checking" in
Twelf lingo.


Definitions So far we have considered the specification language for encoding formal systems. ORBI
also supports declaring inductive definitions for specifying context relations. We start with the
grammar for inductive definitions (Fig. 2). Although we plan to provide syntax for specifying more general
inductive definitions, in this version of ORBI we only define context relations inductively, that is, n-ary
predicates between contexts of some given schemas. Hence the base predicate is of the form id {ctx}
relating different contexts. For example, the Definitions section defines the relations
$\Phi_x \sim \Phi_{xa}$ and $\Phi_{xa} \sim \Phi_{xd}$. To illustrate, only the former is shown below.

def_dec  ::= "inductive" id ":" r_kind "=" def_body
r_kind   ::= "prop"
           | "{" id ":" s_id "}" r_kind
def_body ::= "|" id ":" def_prp {def_body}
def_prp  ::= id {ctx}
           | def_prp "->" def_prp
ctx      ::= [] | [id]
           | ctx "," id ":" blk

Figure 2: ORBI grammar for inductive definitions describing context relations

%% Definitions
inductive Rxa : {g:xG} {h:xaG} prop =
| Rxa_nl: Rxa [] []
| Rxa_cs: Rxa [g] [h]
       -> Rxa [g, b: block (x:tm)] [h, b: block (x:tm, u:aeq x x)];

This kind of relation can be translated fairly directly to inductive n-ary predicates in systems
supporting the proof-theoretic view. In the type-theoretic framework underlying Beluga, inductive predicates
relating contexts correspond to recursive data types indexed by contexts; in fact ORBI adopts Beluga's
concrete syntax, so as to directly type-check those definitions as well. Twelf's type-theoretic framework,
however, is not rich enough to support inductive definitions.

3.2 Theorems and Directives in ORBI 

While the elements of an ORBI specification detailed in the previous subsection were relatively easy to
define in a manner that is well understood by all the different systems we are targeting, we illustrate in
this subsection those elements that are harder to describe uniformly due to the different treatment and
meaning of contexts in the different systems.

Theorems We list the grammar for theorems in Fig. 3. Our reasoning language includes a category prp
that specifies the logical formulas we support. The base predicates include false, true, term equality,
atomic predicates of the form id {ctx}, which are used to express context relations, and predicates of
the form [ctx |- J], which represent judgments of an object language within a given context.
Connectives and quantifiers include implication, conjunction, disjunction, universal and existential
quantification over terms, and universal quantification over context variables.


thm     ::= "theorem" id ":" prp ";"
prp     ::= id {ctx}                         (* Context relation *)
          | "[" ctx "|-" id {term} "]"       (* Judgment in a context *)
          | term "=" term                    (* Term equality *)
          | false                            (* Falsehood *)
          | true                             (* Truth *)
          | prp "&" prp                      (* Conjunction *)
          | prp "||" prp                     (* Disjunction *)
          | prp "->" prp                     (* Implication *)
          | quantif prp                      (* Quantification *)
quantif ::= "{" id ":" s_id "}"              (* universal over contexts *)
          | "{" id ":" tp "}"                (* universal over terms *)
          | "<" id ":" tp ">"                (* existential over terms *)

Figure 3: ORBI grammar for theorems


To illustrate, the reflexivity lemmas and completeness theorems for both the G and R versions as
they appear in the Theorems section are shown below. These theorems are a straightforward encoding
of those stated in Sect. 2.

%% Theorems
theorem reflG: {h:xaG}{M:tm} [h |- aeq M M];
theorem ceqG: {g:daG}{M:tm}{N:tm} [g |- deq M N] -> [g |- aeq M N];

theorem reflR: {g:xG}{h:xaG}{M:tm} Rxa [g] [h] -> [h |- aeq M M];
theorem ceqR: {g:xdG}{h:xaG}{M:tm}{N:tm} Rda [g] [h] ->
              [g |- deq M N] -> [h |- aeq M N];

As mentioned, we do not type-check theorems; in particular, we do not define the meaning of
[ctx |- J], since several interpretations are possible. In Beluga, every judgment J must be
meaningful within the given context ctx; in particular, terms occurring in the judgment J must be meaningful
in ctx. As a consequence, both parametric and hypothetical assumptions relevant for establishing the
proof of J must be contained in ctx. Instead of the local context view adopted in Beluga, Twelf has
one global ambient context containing all relevant parametric and hypothetical assumptions. Systems
based on proof theory such as Hybrid and Abella distinguish between assumptions denoting
eigenvariables (i.e., parametric assumptions), which live in a global ambient context, and proof assumptions
(i.e., hypothetical assumptions), which live in the context ctx. While users of different systems understand
how to interpret [ctx |- J], reconciling these different perspectives in ORBI is beyond the scope of
this paper. Thus for the time being, we view theorem statements in ORBI as a kind of comment, where it
is up to the user of a particular system to determine how to translate them.

Directives In ORBI, directives are comments that help the ORBI2X tools to generate target
representations of the ORBI specifications. The idea is reminiscent of what Ott [24] does to customize certain
declarations, e.g., the representation of variables, to the different programming languages/proof
assistants it supports. The grammar for directives is listed in Fig. 4.


dir    ::= '%' sy_set what decl {decl} {dest} '.'
         | '%%' sepr '.'
sy_id  ::= hy | ab | bel | tw
sy_set ::= '[' sy_id {',' sy_id} ']'
what   ::= wf | explicit | implicit
dest   ::= 'in' ctx | 'in' s_id | 'in' id
sepr   ::= Syntax | Judgments | Rules | Schemas | Definitions
         | Directives | Theorems

Figure 4: ORBI grammar for directives
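Since directives follow a small line-oriented grammar, a first step in an ORBI2X tool could be to split each directive line into its parts. The Python sketch below is a hypothetical helper written for illustration (it is not taken from any existing ORBI tool): it assumes `%` introduces a directive and `%%` a section separator, treats the text before the first `in` as the declaration part without parsing it further, and tolerates a missing final period on separators.

```python
import re

SYSTEMS = {"hy", "ab", "bel", "tw"}
KINDS = {"wf", "explicit", "implicit"}
SECTIONS = {"Syntax", "Judgments", "Rules", "Schemas",
            "Definitions", "Directives", "Theorems"}

# '%' '[' systems ']' what rest-of-line '.'
DIRECTIVE = re.compile(r"^%\s*\[([^\]]*)\]\s*(\w+)\s+(.*?)\s*\.$")


def parse_directive(line):
    """Split one directive line into a dictionary of its components."""
    line = line.strip()
    if line.startswith("%%"):
        name = line[2:].strip().rstrip(".").strip()
        if name not in SECTIONS:
            raise ValueError(f"unknown section separator: {name!r}")
        return {"kind": "section", "name": name}
    match = DIRECTIVE.match(line)
    if match is None:
        raise ValueError(f"not a directive: {line!r}")
    systems = [s.strip() for s in match.group(1).split(",")]
    what = match.group(2)
    if what not in KINDS or not set(systems) <= SYSTEMS:
        raise ValueError(f"malformed directive: {line!r}")
    # The declarations come first; every following "in ..." is a destination.
    decls, *dests = re.split(r"\s+in\s+", match.group(3))
    return {"kind": what, "systems": systems,
            "decls": decls.strip(), "dests": [d.strip() for d in dests]}


wf = parse_directive("% [hy,ab] wf tm.")
explicit = parse_directive("% [hy,ab] explicit (x : tm) in g in Rxa.")
section = parse_directive("%% Syntax")
```

Keeping the declaration part as an uninterpreted string is a deliberate simplification: its meaning is system-specific, so each translator would interpret it in its own way.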

The sepr directives, such as Syntax, are simply a means of structuring ORBI specifications. Most
of the other directives that we consider in this version of ORBI are dedicated to helping the translations
into proof-theoretic systems, although we also include some to facilitate the translation of theorems to
Beluga. The set of directives is not intended to be complete and the meaning of directives is
system-specific. The directives wf and explicit are concerned with the asymmetry in the proof-theoretic view
between declarations that give typing information, e.g., tm:type, and those expressing judgments, e.g.,
aeq:tm -> tm -> type. In Abella and Hybrid, the former may need to be reified in a judgment, in
order to show that judgments preserve the well-formedness of their constituents, as well as to provide
induction on the structure of terms; yet, in order to keep proofs compact and modular, we want to
minimize this reification and only include it where necessary. The Directives section of our sample
specification includes, for example,

% [hy,ab] wf tm.

which refers to the first line of the Syntax section where tm is introduced, and indicates that we need 
a predicate (e.g., is_tm) to express well-formedness of terms of type tm. Formulas expressing the 
definition of this predicate are automatically generated from the declarations of the constructors app and 
lam with their types. 
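The generation step can be pictured as a walk over the constructor signatures that emits one clause per constructor. The Python sketch below illustrates the idea only: the function `wf_clauses`, its input format, and the exact shape of the emitted clauses are assumptions made here, written in the clause style used for the translated rules in this section, and they are not the output of the actual ORBI2Abella or ORBI2Hybrid tools. An argument of type tm yields a plain premise, and an argument of type tm -> tm yields a premise under a universally quantified variable.

```python
def wf_clauses(type_name, constructors):
    """Generate well-formedness clauses for `type_name`.

    `constructors` maps a constructor name to the list of its argument
    types; an argument type is either `type_name` itself or the pair
    (type_name, type_name), standing for `type_name -> type_name`.
    """
    pred = f"is_{type_name}"
    clauses = []
    for name, arg_types in constructors.items():
        args, goals = [], []
        for i, arg_type in enumerate(arg_types, start=1):
            var = f"M{i}"
            args.append(var)
            if arg_type == type_name:
                # First-order argument: it must itself be well-formed.
                goals.append(f"{pred} {var}")
            elif arg_type == (type_name, type_name):
                # Binder argument: check the body for a fresh variable x
                # that is assumed to be well-formed.
                goals.append(f"pi x\\ {pred} x => {pred} ({var} x)")
            else:
                raise ValueError(f"unsupported argument type: {arg_type!r}")
        if len(goals) > 1:
            # Parenthesize quantified goals so their scope stays local.
            goals = [f"({g})" if g.startswith("pi ") else g for g in goals]
        head = f"{pred} ({name} {' '.join(args)})" if args else f"{pred} {name}"
        clauses.append(f"{head} :- {', '.join(goals)}." if goals else f"{head}.")
    return clauses


# Signature of the running example: app: tm -> tm -> tm, lam: (tm -> tm) -> tm.
clauses = wf_clauses("tm", {"app": ["tm", "tm"], "lam": [("tm", "tm")]})
for clause in clauses:
    print(clause)
```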

The keyword explicit indicates when such well-formedness predicates should be included in the 
translation of the declarations in the Rules section. For example, the following formulas both represent 
possible translations of the ae_l rule to proof-theoretic systems. We use Abella’s concrete syntax to 
exemplify: 

aeq (lam M) (lam N) :- pi x\ is_tm x => aeq x x => aeq (M x) (N x).

aeq (lam M) (lam N) :- pi x\ aeq x x => aeq (M x) (N x).

where the typing information is explicit in the first and implicit in the second. By default, we choose the
latter, that is, well-formedness judgments are assumed to be implicit, and require a directive if the former
is desired. Consider, for example, that we want to conclude that whenever a judgment is provable, the terms
in it are well-formed, e.g., if aeq M N is provable, then so are is_tm M and is_tm N. Such a lemma
is indeed provable in Abella and Hybrid from the implicit translation of the rules for aeq. Proving
a similar lemma for the deq judgment, on the other hand, requires some strategically placed explicit
well-formedness information. In particular, the two directives:

% [hy,ab] explicit (x : tm) in de_l.
% [hy,ab] explicit (M : tm) in de_r.

require the clauses de_l and de_r to be translated to the following formulas:

deq (lam M) (lam N) :- pi x\ is_tm x => deq x x => deq (M x) (N x).
deq M M :- is_tm M.

The case for schemas is analogous. In the systems based on proof-theoretic approaches, contexts
are typically represented using lists and schemas are translated to unary inductive predicates that verify
that these lists have a particular regular structure. We again leave typing information implicit in the
translation unless a directive is included. For example, the xaG schema with no associated directive will
be translated to the following inductive definition in Abella:

Define xaG : olist -> prop by
  xaG nil;
  nabla x, xaG (aeq x x :: As) := xaG As.

The directive % [hy,ab] explicit (x : tm) in daG will yield this Hybrid definition:

Inductive daG : list atm -> Prop :=
| nil_da : daG nil
| cns_da : forall (Gamma:list atm) (x:uexp),
    proper x -> daG Gamma -> daG (is_tm x :: deq x x :: aeq x x :: Gamma).

Similarly, directives in context relations, such as: 

% [hy,ab] explicit (x : tm) in g in Rxa.

also state which well-formedness annotations to make explicit in the translated version. In this case, 
when translating the definition of Rxa in the Definitions section, they are to be kept in g, but skipped 
in h. 

Keeping in mind that we consider the notion of directive open to cover other benchmarks and
different systems, we offer some speculation about directives that we may need to translate theorems for the
examples and systems that we are considering. For example, theorem reflG is proven by induction over
M. As a consequence, M must be explicit.

% [hy,ab,bel] explicit (M : tm) in h in reflG.

The ORBI2Hybrid and ORBI2Abella tools will interpret the directive by adding an explicit assumption, 
as illustrated by the result of the ORBI2Abella translation: 

forall H M, xaG H -> {H |- is_tm M} -> {H |- aeq M M}.

In Beluga, the directive is interpreted as:

{h:xaG} {M:[h |- tm]} [h |- aeq M M].

where M will have type tm in the context h. Moreover, since the term M is used in the judgment aeq
within the context h, we associate M with an identity substitution, which is not displayed. In short, the
directive allows us to lift the type specified in ORBI to a contextual type that is meaningful in Beluga. In
fact, Beluga always needs additional information on how to interpret terms: are they closed or can they
depend on a given context? For translating symG, for example, we use the following directive to indicate
the dependence on the context:

% [bel] implicit (M : tm), (N : tm) in h in symG.

3.3 Guidelines 

In addition, we introduce a set of guidelines for ORBI specification writers, with the goal of helping
translators generate output that is more likely to be accepted by a specific system. ORBI 0.1 includes
four such guidelines, which are motivated by the desire to avoid putting too many constraints in the
grammar rules. First, as we have seen in our examples, we use as a convention that free variables which
denote schematic variables in rules are written using upper case identifiers; we use lower case identifiers
for eigenvariables in rules and for context variables. Second, while the grammar does not restrict what
types we can quantify over, the intention is that we quantify over types of level-0, i.e., objects of the
syntax level, only. Third, in order to more easily accommodate systems without dependent types, Pi
should not be used when writing non-dependent types; an arrow should be used instead. (In LF, for
example, A -> B is an abbreviation for Pi x:A.B for the case when x does not occur in B. Following
this guideline means favoring this abbreviation whenever it applies.) Fourth, when writing a context
(grammar ctx), distinct variable names should be used in different blocks.
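
The third guideline can be checked, or even applied automatically, by a small rewriting pass. The Python sketch below illustrates the idea on a toy type syntax: a Pi whose bound variable does not occur in its body is replaced by an arrow. The classes Atom, Pi and Arrow and the functions occurs, prefer_arrows and show are hypothetical names introduced for this illustration; in particular, we assume for simplicity that the arguments of a type family are plain variable names, which is not true of ORBI terms in general.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """A type family applied to variable names, e.g. Atom("aeq", ("M", "N"))."""
    name: str
    args: tuple = ()


@dataclass(frozen=True)
class Pi:
    """Dependent function type: Pi var:dom. cod."""
    var: str
    dom: object
    cod: object


@dataclass(frozen=True)
class Arrow:
    """Non-dependent function type: dom -> cod."""
    dom: object
    cod: object


def occurs(x, ty):
    """Does the variable x occur free in the type ty?"""
    if isinstance(ty, Atom):
        return x in ty.args
    if isinstance(ty, Arrow):
        return occurs(x, ty.dom) or occurs(x, ty.cod)
    # Pi: the body only counts if the binder does not shadow x.
    return occurs(x, ty.dom) or (ty.var != x and occurs(x, ty.cod))


def prefer_arrows(ty):
    """Rewrite every Pi whose bound variable is unused in its body into an arrow."""
    if isinstance(ty, Atom):
        return ty
    if isinstance(ty, Arrow):
        return Arrow(prefer_arrows(ty.dom), prefer_arrows(ty.cod))
    dom, cod = prefer_arrows(ty.dom), prefer_arrows(ty.cod)
    return Pi(ty.var, dom, cod) if occurs(ty.var, cod) else Arrow(dom, cod)


def show(ty):
    """Print a type in the concrete notation used in the text."""
    if isinstance(ty, Atom):
        return " ".join((ty.name,) + ty.args)
    if isinstance(ty, Arrow):
        left = show(ty.dom)
        if not isinstance(ty.dom, Atom):
            left = "(" + left + ")"
        return left + " -> " + show(ty.cod)
    return "Pi " + ty.var + ":" + show(ty.dom) + ". " + show(ty.cod)


if __name__ == "__main__":
    tm = Atom("tm")
    print(show(prefer_arrows(Pi("x", tm, tm))))                      # non-dependent
    print(show(prefer_arrows(Pi("x", tm, Atom("aeq", ("x", "x"))))))  # dependent, kept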


4 Related Work 

Our approach to structuring contexts of assumptions takes its inspiration from Martin-Löf's theory of
judgments, especially in the way it has been realized in Edinburgh LF. However, our formulation owes
more to Beluga's type theory, where contexts are first-class citizens, than to the notion of regular world
in Twelf.

The creation and sharing of a library of benchmarks has proven to be very beneficial to the field
it represents. The brightest example is TPTP [25], whose influence on the development, testing and
evaluation of automated theorem provers cannot be overestimated. Clearly our ambitions are much
more limited. We have also taken some inspiration from its higher-order extension THF0 [3], in particular
in its construction in stages.

The success of TPTP has spurred other benchmark suites in related subjects, see for example SATLIB
[14]; however, the only one concerned with induction is the Induction Challenge Problems
(http://www.cs.nott.ac.uk/~lad/research/challenges), a collection of examples geared to the
automation of inductive proof. The benchmarks are taken from arithmetic, puzzles, functional programming
specifications, etc. and as such have little connection with our endeavor. On the other hand, the
examples mentioned earlier coming from Twelf's wiki, Abella's library, Beluga's distribution, and Hybrid's
web page contain a set of examples that highlight the issues around binders. As such they are prime
candidates to be included in ORBI.

Other projects have put forward LF as a common ground: the goal of Logosphere
(http://www.logosphere.org) was the design of a representation language for logical formalisms,
individual theories, and proofs, with an interface to other theorem proving systems that were somewhat
connected, but the project never materialized. SASyLF [1] originated as a tool to teach programming
language theory: the user specifies the syntax, judgments, theorems and proofs thereof (albeit limited
to closed objects) in a paper-and-pencil HOAS-friendly way and the system converts them to
totality-checked Twelf code. The capability to express and share proofs is of obvious interest to us,
although such proofs, being a literal proof verbalization of the corresponding Twelf type family, are
irremediably verbose. Finally, work on modularity in LF specifications [21] is of critical interest to
give more structure to ORBI files.

Why3 (http://why3.lri.fr) is a software verification platform that intends to provide a front-end
to third-party theorem provers, from proof assistants such as Coq to SMT-solvers. To this end
Why3 provides a first-order logic with rank-1 polymorphism, recursive definitions, algebraic data types
and inductive predicates [9], whose specifications are then translated to the several systems that Why3
supports. Typically, those translations are forgetful, but sometimes, e.g., with respect to Coq, they add 
some annotations, for example to ensure non-emptiness of types. Although we are really not in the same 
business as Why3, there are several ideas that are relevant; to name one, the notion of a driver, that is, 
a configuration file to drive transformations specific to a system. Moreover, Why3 provides an API for 
users to write and implement their own drivers and transformations. 

Ott [24] is a highly engineered tool for "working semanticists," allowing them to write programming
language definitions in a style very close to paper-and-pen specifications; then those are compiled into
LaTeX and, more interestingly, into proof assistant code, currently supporting Coq, Isabelle/HOL, and
HOL. Ott's metalanguage is endowed with a rich theory of binders, but at the moment it favors the
"concrete" (non α-quotiented) representation, while providing support for the nameless representation
for a single binder. Conceptually, it would be natural to extend Ott to generate ORBI code, as a bridge 
for Ott to support HOAS-based systems. Conversely, an ORBI user would benefit from having Ott as a 
front-end, since the latter view of grammar and judgment seems at first sight general enough to support 
the notion of schema and context relation. 

In the category of environments for programming language descriptions, we mention PLT-Redex [5]
and also the K framework [22]. In both, several large-scale language descriptions have been specified
and tested. However, none of those systems has any support for binders, let alone context specifications, 
nor can any meta-theory be formally verified. 

Finally, there is a whole research area dedicated to the handling and sharing of mathematical
content (MKM, http://www.mkm-ig.org) and its representation (OMDoc, https://trac.omdoc.org/OMDoc),
which is only very loosely connected to our project.


5 Conclusion 

We have presented the preliminary design of a language, and more generally, of a common infrastructure 
for representing challenge problems for HOAS-based logical frameworks. The common notation allows 
us to express the syntax of object languages that we wish to reason about, as well as the context schemas, 
the judgments and inference rules, and the statements of benchmark theorems. 

We strongly believe that the field has matured enough to benefit from the availability of a set of 
benchmarks on which qualitative and hopefully quantitative comparison can be carried out. We hope 
that ORBI will foster sharing of examples in the community and provide a common set of examples. We 
also see our benchmark repository as a place to collect and propose “open” challenge problems to push 
the development of meta-reasoning systems. 

The challenge problems also play a role in allowing us, as designers and developers of logical
frameworks, to highlight and explain how the design decisions for each individual system lead to
differences in using them in practice. Additionally, our benchmarks aim to provide a better understanding
of what practitioners should be looking for, as well as help them foresee what kind of problems can be
solved elegantly and easily in a given system, and more importantly, why this is the case. Therefore the
challenge problems provide guidance for users and developers in better comprehending differences and
limitations. Finally, they serve as an excellent regression suite.

The description of ORBI presented here is best thought of as a stepping stone towards a more
comprehensive specification language, much as THF0 [3] has been extended to the more expressive formalism
THF1, adding for instance, rank-1 polymorphism. Many are the features that we plan to provide in the
near future, starting from general (monotone) (co)inductive definitions; currently we only relate contexts,
while it is clearly desirable to relate arbitrary well-typed terms, as shown for example in [4] and [17] with
respect to normalization proofs. Further, it is only natural to support infinite objects and behavior.
However, full support for (co)induction is a complex matter, as it essentially entails fully understanding the
relationship between the proof-theory behind Abella and Hybrid and the type theory of Beluga. Once
this is in place, we can "rescue" ORBI theorems from their current status as comments and even include
proof sketches in ORBI.

Clearly, there is a significant amount of implementation work ahead, mainly on the ORBI2X tools
side, but also on the practicalities of the benchmark suite. Finally, we would like to open up the repository 
to other styles of formalization such as nominal, locally nameless, etc. 


References 

[1] Jonathan Aldrich, Robert J. Simmons & Key Shin (2008): SASyLF: An Educational Proof Assistant for
Language Theory. In: International Workshop on Functional and Declarative Programming in Education,
ACM Press, pp. 31-40, doi:10.1145/1411260.1411266.

[2] Brian E. Aydemir, Aaron Bohannon, Matthew Fairbairn, J. Nathan Foster, Benjamin C. Pierce, Peter
Sewell, Dimitrios Vytiniotis, Geoffrey Washburn, Stephanie Weirich & Steve Zdancewic (2005):
Mechanized Metatheory for the Masses: The PoplMark Challenge. In: Eighteenth International Conference on
Theorem Proving in Higher Order Logics, LNCS 3603, Springer, pp. 50-65, doi:10.1007/11541868_4.

[3] Christoph Benzmüller, Florian Rabe & Geoff Sutcliffe (2008): THF0—The Core of the TPTP Language
for Higher-Order Logic. In: Fourth International Joint Conference on Automated Reasoning, LNCS 5195,
Springer, pp. 491-506, doi:10.1007/978-3-540-71070-7_41.

[4] Andrew Cave & Brigitte Pientka (2012): Programming with Binders and Indexed Data-Types. In:
Thirty-Ninth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, ACM Press,
pp. 413-424, doi:10.1145/2103656.2103705.

[5] Matthias Felleisen, Robert Bruce Findler & Matthew Flatt (2009): Semantics Engineering with PLT Redex. 
The MIT Press. 

[6] Amy P. Felty & Alberto Momigliano (2012): Hybrid: A Definitional Two-Level Approach to Reasoning
with Higher-Order Abstract Syntax. Journal of Automated Reasoning 48(1), pp. 43-105,
doi:10.1007/s10817-010-9194-x.

[7] Amy P. Felty, Alberto Momigliano & Brigitte Pientka (2015): The Next 700 Challenge Problems for
Reasoning with Higher-Order Abstract Syntax Representations: Part 1—A Common Infrastructure for Benchmarks.
CoRR abs/1503.06095. Available at http://arxiv.org/abs/1503.06095.

[8] Amy P. Felty, Alberto Momigliano & Brigitte Pientka (2015): The Next 700 Challenge Problems for
Reasoning with Higher-Order Abstract Syntax Representations: Part 2—A Survey. Journal of Automated Reasoning.
(to appear).


[9] Jean-Christophe Filliâtre (2013): One Logic To Use Them All. In: Twenty-Fourth International Conference
on Automated Deduction, LNCS 7898, Springer, pp. 1-20, doi:10.1007/978-3-642-38574-2_1.

[10] Andrew Gacek (2008): The Abella Interactive Theorem Prover (System Description). In: Fourth
International Joint Conference on Automated Reasoning, LNCS 5195, Springer, pp. 154-161,
doi:10.1007/978-3-540-71070-7_13.

[11] Andrew Gacek, Dale Miller & Gopalan Nadathur (2012): A Two-Level Logic Approach to Reasoning About
Computations. Journal of Automated Reasoning 49(2), pp. 241-273, doi:10.1007/s10817-011-9218-1.

[12] J.-Y. Girard, Y. Lafont & P. Taylor (1990): Proofs and Types. Cambridge University Press.

[13] Nada Habli & Amy P. Felty (2013): Translating Higher-Order Specifications to Coq Libraries Supporting
Hybrid Proofs. In: Third International Workshop on Proof Exchange for Theorem Proving, EasyChair
Proceedings in Computing 14, pp. 67-76.

[14] Holger H. Hoos & Thomas Stützle (2000): SATLIB: An Online Resource for Research on SAT. In: SAT 2000:
Highlights of Satisfiability Research in the Year 2000, Frontiers in Artificial Intelligence and Applications 63,
IOS Press, pp. 283-292.

[15] Alberto Momigliano, Alan J. Martin & Amy P. Felty (2008): Two-Level Hybrid: A System for Reasoning
Using Higher-Order Abstract Syntax. In: Second International Workshop on Logical Frameworks and
Meta-Languages: Theory and Practice, LFMTP 2007, ENTCS 196, Elsevier, pp. 85-93,
doi:10.1016/j.entcs.2007.09.019.

[16] Brigitte Pientka (2007): Proof Pearl: The Power of Higher-Order Encodings in the Logical Framework LF. 
In: Twentieth International Conference on Theorem Proving in Higher-Order Logics, LNCS, Springer, pp. 
246-261, doi:10.1007/978-3-540-74591-4_19. 

[17] Brigitte Pientka & Andrew Cave (2015): Inductive Beluga: Programming Proofs (System Description). In:
Twenty-Fifth International Conference on Automated Deduction, Springer.

[18] Brigitte Pientka & Joshua Dunfield (2010): Beluga: A Framework for Programming and Reasoning with
Deductive Systems (System Description). In: Fifth International Joint Conference on Automated Reasoning,
LNCS 6173, Springer, pp. 15-21, doi:10.1007/978-3-642-14203-1_2.

[19] Benjamin C. Pierce (2002): Types and Programming Languages. MIT Press. 

[20] Adam Poswolsky & Carsten Schürmann (2009): System Description: Delphin—A Functional Programming
Language for Deductive Systems. In: Third International Workshop on Logical Frameworks and
Meta-Languages: Theory and Practice (LFMTP 2008), ENTCS 228, Elsevier, pp. 113-120,
doi:10.1016/j.entcs.2008.12.120.

[21] Florian Rabe & Carsten Schürmann (2009): A Practical Module System for LF. In: Fourth International
Workshop on Logical Frameworks and Meta-Languages: Theory and Practice, ACM Press, pp. 40-48,
doi:10.1145/1577824.1577831.

[22] Grigore Roşu & Traian Florin Şerbănuţă (2010): An Overview of the K Semantic Framework. Journal of
Logic and Algebraic Programming 79(6), pp. 397-434, doi:10.1016/j.jlap.2010.03.012.

[23] Carsten Schürmann (2009): The Twelf Proof Assistant. In: Twenty-Second International
Conference on Theorem Proving in Higher Order Logics, LNCS 5674, Springer, pp. 79-83,
doi:10.1007/978-3-642-03359-9_7.

[24] Peter Sewell, Francesco Zappa Nardelli, Scott Owens, Gilles Peskine, Thomas Ridge, Susmit Sarkar & Rok
Strniša (2010): Ott: Effective Tool Support for the Working Semanticist. Journal of Functional Programming
20(1), pp. 71-122, doi:10.1017/S0956796809990293.

[25] Geoff Sutcliffe (2009): The TPTP Problem Library and Associated Infrastructure. Journal of Automated
Reasoning 43(4), pp. 337-362, doi:10.1007/s10817-009-9143-8.

[26] Yuting Wang, Kaustuv Chaudhuri, Andrew Gacek & Gopalan Nadathur (2013): Reasoning About
Higher-Order Relational Specifications. In: Fifteenth International ACM SIGPLAN Symposium on Principles and
Practice of Declarative Programming, ACM Press, pp. 157-168, doi:10.1145/2505879.2505889.


